optimistic exploration
Optimistic Exploration in Reinforcement Learning Using Symbolic Model Estimates
There has been increasing interest in using symbolic models alongside reinforcement learning (RL), where these coarser abstract models provide RL agents with higher-level guidance. However, most of these works are inherently limited by their assumption of having access to a symbolic approximation of the underlying problem. To address this issue, we introduce a new method for learning optimistic symbolic approximations of the underlying world model. We show how these representations, coupled with the fast, diverse planners developed by the automated planning community, provide a new paradigm for optimistic exploration in sparse-reward settings. We also investigate the possibility of speeding up the learning process by generalizing learned model dynamics across similar actions with minimal human input. Finally, we evaluate the method on multiple benchmark domains and compare it with other RL strategies.
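The abstract does not spell out how an optimistic symbolic approximation behaves, so here is a minimal, heavily simplified sketch of the general idea: treat every effect an action has not yet been observed to lack as possible, plan under that permissive model, and tighten the model as real transitions come in. The toy key-and-door domain, the update rules, and all names below are illustrative assumptions, not the paper's actual algorithm.

```python
# Hedged sketch (not the paper's algorithm): an optimistic symbolic model
# treats every fluent as a possible add effect of an action until observed
# otherwise, plans under that permissive model, and refines it on failure.
from collections import deque

FLUENTS = frozenset({"has_key", "door_open", "at_goal"})
TRUE = {  # ground-truth preconditions/effects, hidden from the learner
    "grab_key":     (set(),         {"has_key"}),
    "open_door":    ({"has_key"},   {"door_open"}),
    "walk_through": ({"door_open"}, {"at_goal"}),
}

possible_adds = {a: set(FLUENTS) for a in TRUE}  # optimism: anything may happen
failed_in = {a: [] for a in TRUE}                # states where the action failed

def applicable(action, state):
    # Optimistic applicability: allowed unless a recorded failure rules it out.
    return not any(state <= f for f in failed_in[action])

def plan(start, goal):
    # Breadth-first search in the optimistic model.
    queue, seen = deque([(frozenset(start), [])]), {frozenset(start)}
    while queue:
        s, p = queue.popleft()
        if goal <= s:
            return p
        for a in TRUE:
            if applicable(a, s):
                s2 = frozenset(s | possible_adds[a])
                if s2 not in seen:
                    seen.add(s2)
                    queue.append((s2, p + [a]))

state, goal = set(), {"at_goal"}
while not goal <= state:
    for a in plan(state, goal):
        pre, adds = TRUE[a]
        if pre <= state:                    # the action succeeds for real
            possible_adds[a] = set(adds)    # its effects are now known exactly
            state |= adds
        else:                               # failure observed: refine the model
            failed_in[a].append(frozenset(state))
            break                           # replan with the tightened model
print("reached goal:", sorted(state))
```

Because the model is only ever optimistic, a plan always exists whenever the goal is still plausibly reachable, which is what lets a fast off-the-shelf planner drive exploration in sparse-reward settings.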
Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning
Reinforcement learning (RL) is a general framework for modeling sequential decision-making problems, at the core of which lies the dilemma of exploration versus exploitation. An agent that fails to explore systematically will inevitably fail to learn efficiently. Optimism in the face of uncertainty (OFU) is a conventionally successful strategy for efficient exploration: an agent following the OFU principle explores actively and efficiently. However, when applied to model-based RL, OFU involves specifying a confidence set over the underlying model and solving a series of nonlinear constrained optimization problems, which can be computationally intractable. This paper proposes an algorithm, Bayesian optimistic optimization (BOO), which adopts a dynamic weighting technique to enforce the constraint rather than explicitly solving a constrained optimization problem. BOO is a general algorithm proven to be sample-efficient for models in a finite-dimensional reproducing kernel Hilbert space. We also develop techniques for effective optimization and show through simulation experiments that BOO is competitive with existing algorithms.
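As a rough illustration of the constrained-optimization difficulty the BOO abstract describes, the sketch below replaces the hard confidence-set constraint with a dynamically weighted penalty term. The toy linear model, the grid search, and the multiplicative weight update are all assumptions for illustration; BOO's actual update rule is not given in the abstract.

```python
# Toy penalty-method sketch of replacing
#     max_theta J(theta)  subject to  fit_loss(theta) <= beta
# with an unconstrained, dynamically weighted objective. This is one simple
# instance of "dynamic weighting", not BOO's actual update rule.
import numpy as np

rng = np.random.default_rng(0)
theta_true = 1.3
X = rng.uniform(-1.0, 1.0, size=30)
y = theta_true * X + 0.1 * rng.standard_normal(30)

def fit_loss(theta):
    # Model-fit loss; the confidence set is {theta : fit_loss(theta) <= beta}.
    return float(np.mean((y - theta * X) ** 2))

beta = 1.5 * fit_loss(theta_true)          # confidence radius (assumed known)
grid = np.linspace(-3.0, 3.0, 601)
losses = np.array([fit_loss(th) for th in grid])
J = grid                                   # toy optimism: larger theta, larger value

lam, theta_t = 1.0, grid[0]
while True:
    theta_t = grid[np.argmax(J - lam * losses)]  # unconstrained surrogate
    if fit_loss(theta_t) <= beta:                # constraint satisfied: stop
        break
    lam *= 1.5                                   # otherwise raise the weight
print(f"optimistic theta = {theta_t:.2f} (true {theta_true}), lambda = {lam:.1f}")
```

The appeal of the weighted surrogate is that each step is an ordinary unconstrained maximization, so it composes with standard optimizers instead of requiring a constrained solver in the loop.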
General Exploratory Bonus for Optimistic Exploration in RLHF
Li, Wendi, Oh, Changdae, Li, Sharon
Optimistic exploration is central to improving sample efficiency in reinforcement learning from human feedback (RLHF), yet existing exploratory bonus methods often fail to realize optimism. We provide a theoretical analysis showing that current formulations, under KL or $\alpha$-divergence regularization, unintentionally bias exploration toward high-probability regions of the reference model, thereby reinforcing conservative behavior instead of promoting discovery of uncertain regions. To address this pitfall, we introduce the General Exploratory Bonus (GEB), a novel theoretical framework that provably satisfies the optimism principle. GEB counteracts divergence-induced bias via reference-dependent reward regulation and unifies prior heuristic bonuses as special cases, while extending naturally across the full $\alpha$-divergence family. Empirically, GEB consistently outperforms baselines on alignment tasks across multiple divergence settings and large language model backbones. These results demonstrate that GEB offers both a principled and practical solution for optimistic exploration in RLHF.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Asia > Middle East > Saudi Arabia > Asir Province > Abha (0.04)
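The pitfall the GEB abstract identifies is easy to reproduce in a toy setting: under KL regularization, the optimal policy has the closed form $\pi \propto \pi_{\text{ref}} \exp(r/\tau)$, so a uniform bonus leaves the reference distribution untouched. The sketch below contrasts that with a reference-dependent regulation of the bonus; dividing by $\pi_{\text{ref}}$ is an assumed stand-in for illustration, not GEB's actual construction.

```python
# Toy reproduction of the pitfall the abstract describes, plus an assumed
# reference-dependent fix. Under KL regularization the optimal policy is
# pi proportional to pi_ref * exp(r / tau), so a uniform bonus changes nothing.
import numpy as np

pi_ref = np.array([0.70, 0.25, 0.05])  # reference model over three responses
bonus = np.ones(3)                     # a "uniform exploration bonus"
tau = 1.0                              # KL-regularization temperature

def kl_regularized_policy(reward):
    # Closed form of argmax_pi  E_pi[reward] - tau * KL(pi || pi_ref).
    w = pi_ref * np.exp(reward / tau)
    return w / w.sum()

naive = kl_regularized_policy(bonus)               # equals pi_ref exactly
regulated = kl_regularized_policy(bonus / pi_ref)  # assumed stand-in for
                                                   # reference-dependent regulation
print("reference:", pi_ref)
print("naive    :", naive.round(3))     # mirrors the reference distribution
print("regulated:", regulated.round(3)) # mass shifted toward rare responses
```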
SOMBRL: Scalable and Optimistic Model-Based RL
Sukhija, Bhavya, Treven, Lenart, Sferrazza, Carmelo, Dörfler, Florian, Abbeel, Pieter, Krause, Andreas
We address the challenge of efficient exploration in model-based reinforcement learning (MBRL), where the system dynamics are unknown and the RL agent must learn directly from online interactions. We propose Scalable and Optimistic MBRL (SOMBRL), an approach based on the principle of optimism in the face of uncertainty. SOMBRL learns an uncertainty-aware dynamics model and greedily maximizes a weighted sum of the extrinsic reward and the agent's epistemic uncertainty. SOMBRL is compatible with any policy optimizer or planner, and under common regularity assumptions on the system, we show that SOMBRL has sublinear regret for nonlinear dynamics in the (i) finite-horizon, (ii) discounted infinite-horizon, and (iii) non-episodic settings. Additionally, SOMBRL offers a flexible and scalable solution for principled exploration. We evaluate SOMBRL on state-based and visual-control environments, where it displays strong performance across all tasks and baselines. We also evaluate SOMBRL on dynamic RC car hardware and show that it outperforms the state of the art, illustrating the benefits of principled exploration for MBRL.
- Europe > Switzerland > Zürich > Zürich (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
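The SOMBRL objective stated in the abstract, a weighted sum of extrinsic reward and epistemic uncertainty, can be sketched with an ensemble whose disagreement serves as the uncertainty estimate. The ensemble-of-linear-models design, the constants, and the function names below are illustrative assumptions; the paper's uncertainty-aware dynamics model is not specified in the abstract.

```python
# Minimal sketch of the stated objective: extrinsic reward plus a weighted
# epistemic-uncertainty term. Ensemble disagreement stands in for the paper's
# uncertainty-aware dynamics model; every constant below is an assumption.
import numpy as np

rng = np.random.default_rng(1)

class EnsembleDynamics:
    """Epistemic uncertainty via disagreement among a model ensemble."""
    def __init__(self, n_models=5, dim=3):
        self.W = [0.1 * rng.standard_normal((dim, dim)) for _ in range(n_models)]
    def predict(self, s, a):
        preds = np.stack([W @ s + a for W in self.W])
        # Mean next-state prediction and a scalar disagreement (std across members).
        return preds.mean(axis=0), float(preds.std(axis=0).mean())

def exploration_objective(r_ext, disagreement, beta=1.0):
    # The weighted sum from the abstract; beta trades reward against exploration.
    return r_ext + beta * disagreement

model = EnsembleDynamics()
s, a = np.ones(3), np.zeros(3)
s_next, sigma = model.predict(s, a)
print(f"extrinsic 0.50 -> exploration objective {exploration_objective(0.5, sigma):.3f}")
```

Because the uncertainty term is just an additive reward shaping, any policy optimizer or planner can consume it unchanged, which is what makes this style of optimism scalable.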
Eluder Dimension and the Sample Complexity of Optimistic Exploration
This paper considers the sample complexity of the multi-armed bandit problem with dependencies among the arms. Some of the most successful algorithms for this problem use the principle of optimism in the face of uncertainty to guide exploration. The clearest example of this is the class of upper confidence bound (UCB) algorithms, but recent work has shown that a simple posterior sampling algorithm, sometimes called Thompson sampling, also shares a close theoretical connection with optimistic approaches. In this paper, we develop a regret bound that holds for both classes of algorithms. This bound applies broadly and can be specialized to many model classes. It depends on a new notion we refer to as the eluder dimension, which measures the degree of dependence among action rewards. Compared to UCB regret bounds for specific model classes, our general bound matches the best available for linear models and is stronger than the best available for generalized linear models.
- Europe (0.93)
- North America > United States (0.93)
- Research Report (0.46)
- Instructional Material (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia (0.14)
- North America > United States > Massachusetts > Middlesex County (0.14)
- (3 more...)
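The canonical example of optimism with dependent arms that this abstract references is the linear-UCB loop sketched below: every pull updates one shared parameter estimate, so the confidence width shrinks for all correlated arms at once. The constant confidence width and the toy problem sizes are simplifying assumptions, not the paper's tuned values.

```python
# Linear-UCB sketch: arms share a parameter vector, so optimism about one arm
# is informed by pulls of the others. Constants are simplifying assumptions.
import numpy as np

rng = np.random.default_rng(2)
d, n_arms, T = 3, 20, 500
arms = rng.standard_normal((n_arms, d))       # feature vector per arm
theta_star = np.array([1.0, -0.5, 0.3])       # unknown shared parameter

A, b = np.eye(d), np.zeros(d)                 # ridge-regression statistics
alpha = 1.0                                   # confidence width (held fixed)
for t in range(T):
    theta_hat = np.linalg.solve(A, b)
    A_inv = np.linalg.inv(A)
    widths = np.sqrt(np.einsum("ij,jk,ik->i", arms, A_inv, arms))
    ucb = arms @ theta_hat + alpha * widths   # optimism: mean + confidence width
    x = arms[np.argmax(ucb)]                  # play the optimistic arm
    reward = x @ theta_star + 0.1 * rng.standard_normal()
    A += np.outer(x, x)                       # one pull tightens every arm's bound
    b += reward * x
print("estimated best arm:", int(np.argmax(arms @ np.linalg.solve(A, b))),
      "| true best arm:", int(np.argmax(arms @ theta_star)))
```

The eluder dimension captures exactly how fast loops like this one can rule out arms: the more the arms' rewards depend on one another, the fewer optimistic mistakes the algorithm can make.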
Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling
Bayrooti, Jasmine, Ek, Carl Henrik, Prorok, Amanda
Learning complex robot behavior through interactions with the environment necessitates principled exploration. Effective strategies should prioritize exploring regions of the state-action space that maximize rewards, with optimistic exploration emerging as a promising direction aligned with this idea and enabling sample-efficient reinforcement learning. However, existing methods overlook a crucial aspect: the need for optimism to be informed by a belief connecting the reward and state. To address this, we propose a practical, theoretically grounded approach to optimistic exploration based on Thompson sampling. Our model structure is the first to allow reasoning about joint uncertainty over transitions and rewards. We apply our method to a set of MuJoCo and VMAS continuous control tasks. Our experiments demonstrate that optimistic exploration significantly accelerates learning in environments with sparse rewards, action penalties, and difficult-to-explore regions. Furthermore, we provide insights into when optimism is beneficial and emphasize the critical role of model uncertainty in guiding exploration.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Middle East (0.14)
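One common way to make Thompson sampling optimistic, plausibly in the spirit of this abstract, is to draw several posterior samples per action and score each action by its best sample, so posterior noise can only raise a score. The sketch below reduces the setting to a Gaussian bandit; the paper's joint transition-and-reward model is richer, and all priors and constants here are assumptions.

```python
# Optimistic Thompson sampling sketch, reduced to a Gaussian bandit: draw K
# posterior samples per action and keep the best, so sampling noise can only
# help an action look good. All priors and constants are assumptions.
import numpy as np

rng = np.random.default_rng(3)
true_mean = np.array([0.1, 0.2, 0.8, 0.5])   # hidden reward means
n_actions, K = len(true_mean), 5

counts = np.ones(n_actions)                   # pseudo-counts for the posterior
means = np.zeros(n_actions)                   # posterior means
for t in range(300):
    # Posterior std shrinks as 1/sqrt(count); take the max of K draws per action.
    draws = means[:, None] + rng.standard_normal((n_actions, K)) / np.sqrt(counts)[:, None]
    a = int(np.argmax(draws.max(axis=1)))     # optimistic action choice
    r = true_mean[a] + 0.3 * rng.standard_normal()
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]    # running-mean posterior update
print("greedy action after learning:", int(np.argmax(means)))
```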